17. Exploring Population Metadata Exercise

You've now learned how to explore individual images and their associated data as you prepare them for machine learning. The other important aspect of EDA is exploring your population. In this exercise, you'll be given a dataframe that describes a large dataset. Your goal is to perform EDA on the population in that dataset so that you can answer the following questions:

  • How are the different diseases distributed in my dataset in terms of frequency and co-occurrence with one another? (For the sake of time, just choose one of the diseases and assess its co-occurrence frequencies with all other diseases.)
  • How is age distributed across my dataset? Is it distributed differently for different diseases?
  • How is sex distributed across my dataset? Is it distributed differently for different diseases?
  • For findings that have a Mass_size (i.e., a measurement rather than just a binary classification of disease presence), is there a relationship between size and age, sex, or presence of other diseases?
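
If you're not sure where to start, the questions above can be approached with a few pandas operations. The sketch below is only one possible way in, not the exercise solution. It runs on a tiny made-up dataframe, and every column name except Mass_size (`Finding Labels`, `Patient Age`, `Patient Sex`) is an assumption, as is the `|` separator between findings. Adjust them to match the dataframe you are actually given.

```python
import pandas as pd

# Toy stand-in for the exercise dataframe. All column names except Mass_size
# are assumptions, and the values are invented purely for illustration.
df = pd.DataFrame({
    "Finding Labels": ["Mass|Nodule", "No Finding", "Mass",
                       "Effusion|Mass", "Nodule", "Effusion"],
    "Patient Age": [63, 41, 57, 70, 35, 52],
    "Patient Sex": ["M", "F", "F", "M", "F", "M"],
    "Mass_size": [22.0, None, 15.0, 30.0, None, None],
})

# One binary indicator column per finding, split on the assumed "|" separator.
labels = df["Finding Labels"].str.get_dummies(sep="|")
data = pd.concat([df, labels], axis=1)

# Question 1: how often each finding appears, and what co-occurs with one
# chosen disease (here "Mass").
frequency = labels.sum().sort_values(ascending=False)
disease = "Mass"
co_occurrence = labels[labels[disease] == 1].drop(columns=disease).sum()

# Question 2: age distribution overall, then split by presence of the disease.
age_overall = data["Patient Age"].describe()
age_by_disease = data.groupby(disease)["Patient Age"].describe()

# Question 3: sex distribution overall, then cross-tabulated with the disease.
sex_overall = data["Patient Sex"].value_counts()
sex_by_disease = pd.crosstab(data[disease], data["Patient Sex"])

# Question 4: for rows that have a Mass_size, relate size to age and sex.
with_size = data.dropna(subset=["Mass_size"])
size_age_corr = with_size["Mass_size"].corr(with_size["Patient Age"])
size_by_sex = with_size.groupby("Patient Sex")["Mass_size"].mean()

print(frequency)
print(co_occurrence)
print(sex_by_disease)
print(size_by_sex)
```

On the real dataset you would likely follow each summary with a plot, such as a histogram of age or a bar chart of the co-occurrence counts, since distributions are easier to compare visually than as tables.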

Code

If you need the code, you can find it at https://github.com/udacity.